[pull] master from ggml-org:master#196

Merged: pull[bot] merged 1 commit into CrazyForks:master from ggml-org:master on Jun 28, 2026

pull[bot] commented on Jun 28, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

* opencl: rework FA kernel for f16 and f32

* opencl: flash-attention prefill prepass kernels

- flash_attn_kv_pad_f16    pads the tail KV tile to a BLOCK_N multiple
- flash_attn_mask_pad_f16  pads the matching mask tile
- flash_attn_blk_f16       classifies each KV tile per query block as
                           fully masked / mixed / fully unmasked, so
                           the main kernel can skip fully-masked tiles
                           and the mask lookup for fully-unmasked ones

* opencl: FA kernels for q4_0 and q8_0

* opencl: `set_rows` for f32 to q8_0/q4_0

* opencl: dequant kernels for q4_0 and q8_0

* opencl: add FA tile tuning table with override

* opencl: wire host side for FA

* opencl: q4_0 MoE tensors are also SOA'ed

* opencl: cosmetic fix

* opencl: refactor, also clarify some code paths in comments

* opencl: fix infinity for `-cl-finite-math-only`
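
The prefill prepass described in the list above (pad the tail KV tile to a BLOCK_N multiple, then classify each KV tile per query block) can be illustrated with a small host-side C++ sketch. This is not the OpenCL code from the commit. The names `TileClass`, `padded_len`, and `classify_tile`, the row-major `float` mask layout, and the convention that masked entries are `-INFINITY` while unmasked entries are `0.0f` are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tile classes mirroring the three cases named in the commit message.
enum class TileClass { FullyMasked, Mixed, FullyUnmasked };

// Round n_kv up to the next multiple of block_n (the padded KV length).
inline std::size_t padded_len(std::size_t n_kv, std::size_t block_n) {
    return (n_kv + block_n - 1) / block_n * block_n;
}

// Classify the mask rectangle [q0, q1) x [k0, k1) of a row-major mask
// whose rows have n_kv_padded entries.
// Assumption: an infinite entry means "masked", anything else "unmasked".
inline TileClass classify_tile(const std::vector<float>& mask, std::size_t n_kv_padded,
                               std::size_t q0, std::size_t q1,
                               std::size_t k0, std::size_t k1) {
    bool any_masked = false;
    bool any_unmasked = false;
    for (std::size_t q = q0; q < q1; ++q) {
        for (std::size_t k = k0; k < k1; ++k) {
            if (std::isinf(mask[q * n_kv_padded + k])) {
                any_masked = true;
            } else {
                any_unmasked = true;
            }
        }
    }
    if (!any_unmasked) return TileClass::FullyMasked;
    if (!any_masked) return TileClass::FullyUnmasked;
    return TileClass::Mixed;
}
```

With such a per-tile class available, a main kernel could skip `FullyMasked` tiles entirely and omit the mask read for `FullyUnmasked` tiles, which is the saving the commit message describes.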

---------

Co-authored-by: Li He <lih@qti.qualcomm.com>
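
For readers unfamiliar with the q4_0 and q8_0 formats named in the commit list, the following C++ sketch shows how one block of each is dequantized in ggml's reference layout (32 values per block, one scale per block). It is a simplified illustration rather than the OpenCL kernels from this change: the struct names `BlockQ4` and `BlockQ8` are hypothetical, the scale is stored as `float` instead of fp16, and the plain per-block layout is used even though the commit list indicates the OpenCL backend stores some tensors in SOA form.

```cpp
#include <array>
#include <cstdint>

constexpr int QK = 32;  // values per block in ggml's q4_0 and q8_0 formats

// Simplified blocks: ggml stores the scale as fp16; float is used here for brevity.
struct BlockQ8 { float d; std::array<std::int8_t, QK> qs; };
struct BlockQ4 { float d; std::array<std::uint8_t, QK / 2> qs; };

// q8_0: each value is the signed 8-bit quant times the block scale.
inline void dequant_q8(const BlockQ8& b, float* out) {
    for (int i = 0; i < QK; ++i) {
        out[i] = b.d * static_cast<float>(b.qs[i]);
    }
}

// q4_0: each byte packs two 4-bit quants with an implicit offset of 8.
// The low nibble holds element i, the high nibble holds element i + QK/2.
inline void dequant_q4(const BlockQ4& b, float* out) {
    for (int i = 0; i < QK / 2; ++i) {
        out[i]          = b.d * static_cast<float>((b.qs[i] & 0x0F) - 8);
        out[i + QK / 2] = b.d * static_cast<float>((b.qs[i] >> 4) - 8);
    }
}
```
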
pull[bot] locked and limited conversation to collaborators on Jun 28, 2026
pull[bot] added the ⤵️ pull label on Jun 28, 2026
pull[bot] merged commit ebd048f into CrazyForks:master on Jun 28, 2026
7 of 25 checks passed